AUTHORS: John Tsiligaridis
ABSTRACT: This work presents a method for classification with a Support Vector Machine (SVM) via a Decision Tree (DT) algorithm and with Vector Quantization. A probabilistic Decision Tree algorithm focusing on large-frequency classes (DTPL) is developed, along with a method for SVM classification via DTs using Tabu Search (TS), named DT_SVM. To reduce the training complexity of the SVM, the DTPL performs partitions that can be treated as clusters, and the TS algorithm provides the ability to approximate the decision boundary of an SVM. Based on the DTs, an SVM algorithm is developed that improves SVM training time by considering only a subset of each cluster's instances. To reduce the size of the SVM training set, a vector quantization algorithm (LBG) is used; the LBG classifier is based on Euclidean distance. Finally, an optimization method, Simulated Annealing (SA), is applied over the quantization level to discover a minimization criterion based on error and low complexity that supports the SVM operation. The resulting V_S_SVM can provide lower error at reasonable computational complexity. A Neural Network (NN) is composed of many neurons linked together according to a specific network topology. The main characteristics of SVMs and NNs are presented, and a comparison between NNs and SVMs with two types of kernels shows the superiority of the SVM. The V_S_SVM with an RBF kernel is compared with DT_SVM and provides useful results. Simulation results for all the algorithms on data sets of different complexity are provided.
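The abstract's idea of shrinking the SVM training set with LBG vector quantization can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the function names (`lbg_codebook`, `nearest`) and parameters are illustrative, and the sketch only shows the standard LBG splitting step followed by Lloyd iterations, with nearest-codevector lookup by Euclidean distance. The resulting codevectors (one per quantization level) could then stand in for the full data set when training an SVM.

```python
def lbg_codebook(points, levels, iters=20, eps=1e-2):
    """Grow a codebook by LBG splitting: start from the global mean,
    split each codevector by +/- eps, then refine with Lloyd iterations."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    codebook = [mean]
    while len(codebook) < levels:
        # Split every codevector into two slightly perturbed copies.
        codebook = [[c[d] + s * eps for d in range(dim)]
                    for c in codebook for s in (+1.0, -1.0)]
        for _ in range(iters):
            # Assign each point to its nearest codevector (Euclidean distance).
            buckets = [[] for _ in codebook]
            for p in points:
                buckets[nearest(codebook, p)].append(p)
            # Move each codevector to the centroid of its bucket
            # (keep the old codevector if its bucket is empty).
            codebook = [[sum(p[d] for p in b) / len(b) for d in range(dim)] if b else c
                        for b, c in zip(buckets, codebook)]
    return codebook

def nearest(codebook, p):
    """Index of the codevector closest to p in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((p[d] - codebook[i][d]) ** 2
                                 for d in range(len(p))))

# Illustrative usage on two well-separated 2-D clusters: with levels=2,
# the codebook settles on one representative per cluster.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (-0.1, 0.2),
        (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]
codebook = lbg_codebook(data, levels=2)
```

A nearest-centroid classifier over such a codebook (assigning each codevector the majority label of its bucket) is one simple way to read "the LBG classifier is based on Euclidean distance" in the abstract; raising `levels` trades SVM training-set size against quantization error, which is the quantity the paper's SA search tunes.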
KEYWORDS: SVM, Neural Networks, LBG, Decision Trees, Simulated Annealing, Data Mining